Software Mining Studies: Goals, Approaches, Artifacts, and Replicability
Authors
Abstract
The mining of software archives has enabled new ways of increasing productivity in software development. Research topics include analyzing software quality, mining project evolution, investigating change patterns and evolution trends, mining models of development processes, developing methods for integrating mined data from various historical sources, and analyzing natural language artifacts in software repositories. Software repositories hold varied data, ranging from source control systems and issue tracking systems, through artifact repositories for requirements, design, and architectural documentation, to archived communication between project members. Practitioners and researchers have recognized the potential of mining these sources to support software maintenance, to improve software design or architecture, and to empirically validate development techniques or processes. We revisited software mining studies published in recent years in the top software engineering venues, such as ICSE, ESEC/FSE, and MSR. In analyzing these studies, we highlight four viewpoints: pursued goals, state-of-the-art approaches, mined artifacts, and study replicability. To analyze the mined artifacts, we lexically analyzed research papers spanning more than a decade. For replicability, we looked at existing work on mining approaches, tools, and platforms. We address issues of replicability and reproducibility to shed light on the challenges facing large-scale mining studies that would enable stronger conclusion stability.
Similar resources
Beyond Replication: An example of the potential benefits of replicability in the Mining of Software Repositories Community
While in theory mining software repositories is an area where replication is easier to perform than in other empirical software engineering fields, a review of papers presented at the Mining Software Repositories Workshop/Working Conference shows that the research studies presented do not satisfy the requirements for easy replication. In this paper, we present some possibilities that repli...
Predicting Software Product Quality: A Systematic Mapping Study
Predicting software product quality (SPQ) is becoming a permanent concern during software life cycle phases. In this paper, a systematic mapping study was performed to summarize the existing SPQ prediction (SPQP) approaches in literature and to organize the selected studies according to seven classification criteria: SPQP approaches, research types, empirical types, data sets used in the empiri...
Towards a Framework for Empirical Project Analysis for Software Engineering Models
Data collection and analysis is a central issue in empirical software engineering. This is particularly true for automated gathering of data. The empirical evaluation of many research approaches also requires combining data sources from various domains. Capturing and combining the spatial and temporal data from heterogeneous sources is a non-trivial and time-consuming task. In t...
Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing
Designing secure software is one of the major phases in developing robust software. The McGraw life cycle, one of the well-known software security development approaches, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...
Declarative Visitors to Ease Fine-grained Source Code Mining with Full History on Billions of AST Nodes by Robert Dyer, Hridesh Rajan, and Tien N. Nguyen
Software repositories contain a vast wealth of information about software development. Mining these repositories has proven useful for detecting patterns in software development, testing hypotheses for new software engineering approaches, etc. Specifically, mining source code has yielded significant insights into software development artifacts and processes. Unfortunately, mining source code at...